Design Time is Reduced When Emulating a 5-Million Gate-Equivalent ASIC

By Terry Thompson
Integrated System Design
Posted 07/11/01, 02:58:06 PM EDT

Chrysalis Semiconductor, a division of Chrysalis-ITS, is designing network security protocol processing ICs that, together with associated software and printed-circuit boards, are enablers for e-commerce and for highly specialized transaction- and communication-based markets. Chrysalis-ITS designs both the hardware and the software for these systems.

The first in a series of Chrysalis-ITS network security processors, the Luna 340 was fully emulated on the Axis Xtreme system to enable system verification and to accelerate software development and system tuning. Not only did we get the improved performance we sought, but we also reduced design time for subsequent ICs. In fact, we were able to let our software designers start working on the emulated version of the design four months before we got silicon prototypes back.

The Luna 340 uses multiple embedded RISC cores to create a complete SoC that provides both asymmetric and symmetric cryptographic processing, as well as the associated network security protocol processing. The Luna 341 is a companion processor (Fig. 1). Its high-speed modular exponentiation, encryption and hashing acceleration hardware is designed to work seamlessly with the Luna 340, accelerating SSL-related cryptographic operations (RC4, MD5, SHA-1) over a 32-bit, 66-MHz PCI interface.

The Luna 340, a 5M-gate-equivalent ASIC, was designed in partnership with Mosaid Technologies. It contains more than 22 million transistors implemented in 0.25-micron technology, including five ARC RISC cores with associated memory, two 66-MHz PCI interfaces and a complex 256-bit multiplier. The first spin of the device produced functional silicon, but its performance fell short of expectations, so two areas were identified for improvement.

First, we needed a better hardware/software co-design environment; that's where Axis Systems' emulation capability came in. Second, as the security market becomes better defined, it becomes possible to implement dedicated hardware blocks instead of relying on a microprocessor-based solution. The Luna 341, the first Chrysalis-ITS chip to use such blocks, follows a platform-based SoC design approach to produce a hierarchical, scalable, modular design that is a million-gate-equivalent ASIC. The engines for modular exponentiation, RC4, MD5 and SHA-1 were all developed by Chrysalis-ITS' IC team. Using multiple clock domains allowed the team to achieve maximum performance for each encryption engine. The Luna 341 was fully prototyped on multiple FPGAs and is currently being converted to an ASIC.

Because the Luna 340 is processor-based, there is a significant amount of software to design and verify, including drivers, application code and more than 128 kbytes of embedded firmware. The Luna 340 control processor receives commands from the host and dispatches the tasks to its own symmetric processors or the Luna 341's crypto-accelerator. These commands control high-level cryptographic operations flexibly and programmably.
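The following is a rough illustration of the command flow just described, written as a toy dispatcher in Python. Everything in it is hypothetical: the operation names, the `Command` type, the assumption that four of the five ARC cores act as symmetric processors, and the routing rule. Under that rule, operations the Luna 341 accelerates go to it, and every other operation goes round-robin to an on-chip processor. The article does not describe the actual firmware policy.

```python
from dataclasses import dataclass
from itertools import cycle

# Hypothetical: operations the Luna 341 companion chip is assumed to accelerate.
ACCELERATOR_OPS = {"modexp", "rc4", "md5", "sha1"}


@dataclass
class Command:
    """A host command (hypothetical format): an operation name plus its data."""
    op: str
    payload: bytes = b""


class ControlProcessor:
    """Toy model of a control processor routing host commands to workers."""

    def __init__(self, num_symmetric: int = 4) -> None:
        # Assumption: num_symmetric on-chip processors, served round-robin.
        self._next_core = cycle(range(num_symmetric))

    def dispatch(self, cmd: Command) -> str:
        """Return the name of the unit chosen to execute the command."""
        if cmd.op in ACCELERATOR_OPS:
            return "luna341"  # offload to the companion crypto-accelerator
        return f"sym{next(self._next_core)}"  # next on-chip symmetric processor


if __name__ == "__main__":
    cp = ControlProcessor()
    for op in ["modexp", "record_protect", "record_protect", "sha1"]:
        print(op, "->", cp.dispatch(Command(op)))
```

A real dispatcher would also weigh queue depth and PCI transfer cost. The fixed set membership test simply keeps the routing decision visible.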

Our verification tool set included VHDL and Verilog simulators and software development tools from ARC, Axis Systems, Cadence and Mentor Graphics. The logic simulators lacked the performance we needed for system-level verification, so we added Axis' hardware system, which is based on reconfigurable computing (RCC) technology. Running orders of magnitude faster than standard logic simulation, it is the cornerstone of our verification effort, and the entire design team uses it.

RCC technology uses programmable devices to enable simulation acceleration, system emulation and hardware/software co-verification on a single platform. It requires only one design database. You can set up the RCC system to accelerate logic simulation, or you can use the emulation capability to make it look like the device and drive it from the real target system.

To our hardware engineers, the RCC system works like the logic simulator they are accustomed to (Fig. 2). Because it can instantly swap between software-based logic simulation, simulation acceleration and system emulation, our engineers could emulate at very high speed and then switch to logic simulation mode for full debugging capabilities. To our software designers, it looks as if the actual hardware is back from the fab. The system lets them work in their preferred host environment with their standard development tools, including instruction set simulators and source-level debuggers, instead of working with waveforms and large log files from logic simulators. Software engineers can even single-step through their code, observing what happens to the registers with each line of code.

The RCC technology accelerates logic simulation by utilizing programmed coprocessors, where each processor is designed for a single task. The number and type of processors selected during design compilation depend on the design description. Then these hundreds of thousands of processors are mapped into FPGAs and executed using an event-driven parallel-computing architecture with a highly efficient communication algorithm.
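The event-driven idea can be illustrated in software. The Python sketch below is a generic, sequential event-driven gate simulator, not Axis' RCC implementation. Each gate plays the role of a single-task "processor," and a gate is re-evaluated only when one of its inputs changes. RCC, as described above, maps such evaluators onto FPGAs and runs them in parallel. The class name, the uniform unit gate delay and the netlist format are our assumptions.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Each evaluator performs exactly one task: computing one gate's output.
GATE_FUNCS: Dict[str, Callable[..., int]] = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NOT": lambda a: 1 - a,
}


class EventSim:
    """Sequential event-driven simulator for a netlist of single-task gates."""

    def __init__(self, gates: Dict[str, Tuple[str, List[str]]], delay: int = 1) -> None:
        self.gates = gates          # output net -> (gate type, input nets)
        self.delay = delay          # assumed uniform gate delay, in time units
        self.values: Dict[str, int] = {}
        self.fanout: Dict[str, List[str]] = {}
        for out, (_, inputs) in gates.items():
            for net in inputs:
                self.fanout.setdefault(net, []).append(out)
        self.queue: List[Tuple[int, int, str, int]] = []  # (time, seq, net, value)
        self._seq = 0               # tie-breaker keeps same-time events in order

    def schedule(self, time: int, net: str, value: int) -> None:
        heapq.heappush(self.queue, (time, self._seq, net, value))
        self._seq += 1

    def run(self) -> Dict[str, int]:
        while self.queue:
            time, _, net, value = heapq.heappop(self.queue)
            if self.values.get(net) == value:
                continue            # no change: nothing downstream needs evaluating
            self.values[net] = value
            for out in self.fanout.get(net, []):
                kind, inputs = self.gates[out]
                if all(i in self.values for i in inputs):
                    result = GATE_FUNCS[kind](*(self.values[i] for i in inputs))
                    self.schedule(time + self.delay, out, result)
        return self.values
```

For a half adder (`{"sum": ("XOR", ["a", "b"]), "carry": ("AND", ["a", "b"])}`), driving both inputs to 1 yields `sum` 0 and `carry` 1. A later change on `b` re-evaluates only the two gates in its fanout.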

When we started development on the Luna 340 network security chip, we did not yet have RCC emulation. The Luna 340 was designed using a combination of Verilog and VHDL. When we brought the Axis RCC system in at the end of the design process for the Luna 340B, we had already synthesized the design to gates and were easily able to run the synthesized gate-level representation of the design on the RCC system. Within one week of taking delivery of the unit, we were modeling the Luna 340B on the box. We used the RCC system to verify the re-spin and to provide a model to exercise the Luna 341. Tape-out on the Luna 340B was a success.

For the Luna 341 design, we used Verilog. That allowed us to run and debug the design on the Axis box in RTL code. The RCC system supports behavioral, RTL and gate-level descriptions.

Verification challenges

We deliver a considerable amount of specialized software with our devices, and it is a major component in the overall system. With such a large software component, it is critical to use a verification environment that offers a cycle-accurate representation of the device and supports software bring-up. We chose to use the ARC cores for this chip set because of the level of customization they permitted. Since the cores are extensible and the customers usually make significant modifications, the vendors could not easily provide cycle-accurate models. So without an easy path to emulation, we were forced to make a trade-off between adding extra functionality to get increased performance and having a good hardware/software co-design environment.

When we added the RCC system, we gained a simple means of using our gate-level design as a cycle-accurate model for the software engineers. So in addition to verifying functionality, we could now identify problems such as contention and wait states.

Software written to complete a function in 5,000 clock cycles could turn out to take 8,000 because of contention for resources. If the IC model is not cycle-accurate, it's very difficult to identify the wait states and therefore to predict performance. The RCC system is ideal for this situation. Since our five ARC cores were synthesizable and we had netlists, we simply ran the design on the RCC system; we no longer needed vendor-supplied models. In emulation mode, the system becomes the cycle-accurate model we were looking for, and we can run actual software on it. Because the Axis system is both a simulation accelerator and an emulator, it is much more than a cycle-accurate model: it provides a complete simulation and debug environment that is useful to both hardware and software engineers.
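A generic cycle-level model in Python shows how contention can turn a 5,000-cycle routine into an 8,000-cycle one. It is a sketch under simplifying assumptions, not a model of the Luna 340. It assumes a single shared bus that serves one memory access per cycle, one-cycle ALU operations and round-robin arbitration. A master that loses arbitration stalls for a wait state and retries on the next cycle.

```python
from typing import List


def run_cycles(programs: List[List[str]]) -> List[int]:
    """Return the cycle on which each master finishes its program.

    Assumed model: each instruction is "alu" (completes in one cycle) or
    "mem" (needs the single shared bus for one cycle). The bus serves one
    master per cycle, chosen round-robin; the others stall (a wait state)
    and retry on the next cycle.
    """
    n = len(programs)
    pc = [0] * n          # index of each master's next instruction
    finish = [0] * n
    priority = 0          # master that wins the bus if it asks for it
    cycle = 0
    while any(pc[m] < len(programs[m]) for m in range(n)):
        cycle += 1
        wanting = [m for m in range(n)
                   if pc[m] < len(programs[m]) and programs[m][pc[m]] == "mem"]
        winner = None
        if wanting:
            winner = min(wanting, key=lambda m: (m - priority) % n)
            priority = (winner + 1) % n
        for m in range(n):
            if pc[m] >= len(programs[m]):
                continue  # this master has already finished
            if programs[m][pc[m]] == "alu" or m == winner:
                pc[m] += 1
                if pc[m] == len(programs[m]):
                    finish[m] = cycle
    return finish
```

Run alone, a program of four memory accesses finishes in 4 cycles. Run next to an identical program, the two finish on cycles 7 and 8, because each master spends about half its time waiting for the bus. A purely functional model would show both programs completing correctly and reveal nothing about those stalls.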

The other major challenge we faced was the lack of raw simulation performance. With logic simulation, we simply didn't have the throughput we needed to adequately test the system. But the RCC system offered significantly enhanced performance (see table).

For one particular test, the hardware engineers took a logic simulation that required two days to run and ported it to the RCC system. Running in simulation acceleration mode, the same test took only 40 seconds. That raw performance allowed the team to add more tests to verify much more complex behaviors than we originally could do. This is important when dealing with a multichip system.
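Here is the speedup on that test, assuming "two days" means 48 hours of wall-clock time:

```latex
\text{speedup} = \frac{2\ \text{days} \times 86{,}400\ \text{s/day}}{40\ \text{s}}
               = \frac{172{,}800\ \text{s}}{40\ \text{s}} = 4{,}320
```

That is more than three orders of magnitude, in line with the "orders of magnitude" figure cited for the RCC system.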

Co-verification

Our design process has recently evolved to include hardware/software co-verification. Because we could run cycle-accurate models of the hardware, we dramatically increased our confidence in the entire design, both its hardware and its software components. On the original design of the Luna 340, hardware/software co-verification was not used. The design team achieved three levels of coverage on the software code. The designers loaded the device drivers, such as the PCI driver and the firmware, and then added the application code on top. When we put the whole system together it worked, but not as fast as we had expected, as discussed earlier.

When the design team ran real customer software through the RCC emulator, we found that with software-implementation changes, we could reduce the number of cycles needed for various tasks. We were also able to identify the bottlenecks in the hardware, which helped us to focus our design effort.

That became evident in the design of the companion chip, where complex instructions that had taken 350 cycles in the first revision of the design could now be completed in 200 cycles.

With the cycle-accurate model of the ARC, our designers can get a feel for no-op states, where the core is trying to do two things at once. The core may need to finish one activity before it gets to another because of resource contention. When we determine why it is calling on one device too often, we can change the software to avoid that. We run the software again and know for sure that the contention was removed.

The RCC system becomes a fully accurate model of our chip. The system is then connected to the board with a cable, so that we have a model of the entire system, and we simply run the software on that. Some of the software may run on the RCC system, some on the microprocessor on the board, or a combination of the two. Since the hardware is still an RTL description, it is straightforward to make hardware modifications based on the results. That permits true trade-off analysis and lets us change the hardware or the software to maximize system performance.

The biggest gain we see in our approach to hardware/software co-verification is that the software is developed on known-good hardware. When we started the Luna 341 design, the RCC system was used to emulate the Luna 340B and exercise the software while the Luna 340B was being manufactured. That gave us a four-month head start on the software development cycle for the Luna 341; we didn't have to wait four months for actual silicon.

Hardware/software co-verification increases our confidence that a re-spin won't be needed from 30 percent to more than 90 percent, which potentially represents a big savings. Saving a re-spin on 0.15-micron technology is equivalent to saving close to $1 million, not to mention the four months needed to fabricate a design.
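One way to read those percentages is as an expected cost. This assumes that "confidence" can be treated as the probability that no re-spin is needed, and that a re-spin costs about $1 million and four months:

```latex
\begin{aligned}
\mathbb{E}[\text{re-spin cost}] &= (1 - p)\,C, \qquad C \approx \$1\,\text{M and } 4\ \text{months} \\
p = 0.30:\quad \mathbb{E} &= 0.70 \times \$1\,\text{M} = \$700\,\text{k} \qquad (0.70 \times 4 = 2.8\ \text{months}) \\
p = 0.90:\quad \mathbb{E} &= 0.10 \times \$1\,\text{M} = \$100\,\text{k} \qquad (0.10 \times 4 = 0.4\ \text{months})
\end{aligned}
```

Under those assumptions, co-verification is worth roughly $600,000 and 2.4 months of schedule per project in expectation. Because confidence is "more than 90 percent," the remaining expected cost is at most the $100,000 shown.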

An additional benefit we found in using the RCC system for hardware/software co-verification was the ability to force and release nodes, as in a software environment. We used this to simulate sequences such as power-up. In security devices, power-up sequences are very extensive, since it is important to ensure that sensitive information is not inadvertently disclosed. We also added behavioral code in the form of checkers and watchers, which let us display messages such as "Chip has been cleared" on the Unix command line when a specific sequence of events triggered a checker. These features gave us much better visibility into the operation of the system.
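The article doesn't show the checker code. The Python sketch below illustrates the same idea: a watcher consumes (time, signal, value) events and prints a message once an expected sequence has occurred in order. The signal names (`por_n`, `zeroize_req`, `key_ram_clear_done`) and the in-order, non-contiguous matching rule are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Event = Tuple[int, str, int]  # (time, signal name, new value)


class SequenceChecker:
    """Reports once when an expected sequence of (signal, value) events has occurred.

    Matching is in order but not necessarily contiguous: unrelated events in
    between are ignored. This is a deliberately simple rule for illustration.
    """

    def __init__(self, sequence: List[Tuple[str, int]], message: str) -> None:
        self.sequence = sequence
        self.message = message
        self.matched = 0  # number of sequence steps seen so far

    def observe(self, event: Event) -> Optional[str]:
        time, signal, value = event
        if self.matched < len(self.sequence) and (signal, value) == self.sequence[self.matched]:
            self.matched += 1
            if self.matched == len(self.sequence):
                return f"{self.message} (t={time})"
        return None


def watch(events: Iterable[Event], checker: SequenceChecker) -> List[str]:
    """Feed events to the checker, printing each message it produces."""
    messages = []
    for event in events:
        msg = checker.observe(event)
        if msg is not None:
            print(msg)  # stand-in for the message on the Unix command line
            messages.append(msg)
    return messages
```

Feeding it a power-up trace in which `por_n` rises, `zeroize_req` asserts and `key_ram_clear_done` asserts at t=95 prints `Chip has been cleared (t=95)`. Unrelated events in between, such as a PCI reset, are ignored.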

RCC delivers the performance necessary for true hardware/software co-verification, which enables early testing of embedded software before prototypes are available. It augments the verification done by hardware engineers, leading to shorter project schedules and increased confidence in the completed design.

Debugging on an emulator

Most emulators lack advanced hardware debugging capabilities such as checkpointing. They cannot return to any place in simulation time and continue interactive debugging. They also synthesize to gate-level representations, introducing ambiguity that is hard to overcome in the debugging process.

RCC-based emulation does not have any of these limitations. It implements RTL directly on the programmed coprocessors, so hardware verification engineers are debugging the code they wrote, not a synthesized representation of it. Because of the tight integration between a software-based logic simulator and RCC, the engineers can instantaneously swap between emulation and logic simulation for unparalleled debug capabilities. Designers can return to any point in the simulation to view and debug the design.
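The article doesn't say how Axis implements "return to any point." The generic way to provide it is checkpoint and replay. The Python sketch below assumes a deterministic design model: it snapshots state periodically and, to reach an earlier cycle, restores the nearest snapshot and re-simulates forward. `CheckpointingSim` is a hypothetical name, and `lfsr_step` (a 16-bit LFSR) merely stands in for a design.

```python
import copy
from typing import Callable, Dict, Generic, TypeVar

S = TypeVar("S")


class CheckpointingSim(Generic[S]):
    """Deterministic simulator that can rewind to any cycle it has already run.

    A snapshot of the state is saved every `interval` cycles. Rewinding to
    cycle t restores the latest snapshot at or before t and re-simulates
    forward, which is exact only because `step` is deterministic.
    """

    def __init__(self, initial: S, step: Callable[[S], S], interval: int = 100) -> None:
        self.step = step
        self.interval = interval
        self.cycle = 0
        self.state = initial
        self.checkpoints: Dict[int, S] = {0: copy.deepcopy(initial)}

    def run(self, cycles: int) -> None:
        for _ in range(cycles):
            self.state = self.step(self.state)
            self.cycle += 1
            if self.cycle % self.interval == 0:
                self.checkpoints[self.cycle] = copy.deepcopy(self.state)

    def rewind(self, target: int) -> None:
        if not 0 <= target <= self.cycle:
            raise ValueError("can only rewind to a cycle that has already been simulated")
        # Snapshots after the target describe a future we are abandoning
        # (e.g. before forcing a node), so discard them.
        self.checkpoints = {c: s for c, s in self.checkpoints.items() if c <= target}
        base = max(self.checkpoints)
        self.state = copy.deepcopy(self.checkpoints[base])
        self.cycle = base
        self.run(target - base)  # replay the gap between snapshot and target


def lfsr_step(state: int) -> int:
    """Stand-in design: one clock of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)
```

After `sim.run(1000)`, calling `sim.rewind(437)` restores the cycle-400 snapshot and replays 37 cycles instead of starting over from time zero. The snapshot interval trades memory for replay time.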

Our hardware verification engineers are finding that the approach has huge advantages over traditional emulation solutions. They run regressions nightly and examine the results every morning. Since the logic simulation won't tie up the emulator, we continue to run real test cases while designers debug the previous night's results.

The setup was simple: It took less than one day to get the Luna 341 design up and running on the RCC system. And the compile time was relatively short: It took one hour to compile the entire system description into the box. Unlike most emulators, the RCC system does not require the use of a logic analyzer, further simplifying the setup requirements. Instead of setting probes to trigger at certain events, engineers can specify the time period and depth of what they would like to see in the VCD results.
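Value change dump (VCD) is the standard text waveform format defined in IEEE 1364. The Python function below is a sketch of what "time period and depth" selection amounts to. It writes a minimal VCD file containing only the selected signals and only the changes inside a chosen time window, with each signal's value at the window start emitted under `$dumpvars`. The function name, its arguments and the 1-ns timescale are our assumptions, not the RCC system's interface. "Depth" is approximated here as simply choosing which signals to dump.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, str]            # an integer, or "x" for unknown
Change = Tuple[int, str, Value]    # (time, signal name, new value)


def write_vcd(changes: List[Change], widths: Dict[str, int], start: int, end: int,
              timescale: str = "1ns", scope: str = "top") -> str:
    """Render the changes of the signals in `widths` that fall in [start, end] as VCD text."""
    # One printable-ASCII identifier code per signal (enough for up to 94 signals).
    ids = {name: chr(33 + i) for i, name in enumerate(widths)}

    def fmt(name: str, value: Value) -> str:
        if widths[name] == 1:
            return f"{value}{ids[name]}"  # scalar: value immediately followed by id
        bits = value if isinstance(value, str) else format(value, "b")
        return f"b{bits} {ids[name]}"     # vector: binary value, a space, then id

    ordered = sorted(changes, key=lambda c: c[0])  # stable: keeps same-time order
    # Value of each selected signal as of the window start ("x" if never set).
    current: Dict[str, Value] = {name: "x" for name in widths}
    for t, name, value in ordered:
        if t <= start and name in widths:
            current[name] = value

    lines = [f"$timescale {timescale} $end", f"$scope module {scope} $end"]
    lines += [f"$var wire {w} {ids[n]} {n} $end" for n, w in widths.items()]
    lines += ["$upscope $end", "$enddefinitions $end", f"#{start}", "$dumpvars"]
    lines += [fmt(n, current[n]) for n in widths]
    lines.append("$end")

    last_time = start
    for t, name, value in ordered:
        if start < t <= end and name in widths:
            if t != last_time:
                lines.append(f"#{t}")
                last_time = t
            lines.append(fmt(name, value))
    return "\n".join(lines) + "\n"
```

Dumping a `clk`/`data` trace for the window 10 to 15 produces a file that any VCD viewer can open. Changes outside the window never reach the file, which keeps dumps small on long runs.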

Increases in run-time performance enabled our designers to complete additional testing. The design team was thrilled that these great gains in performance did not come at the expense of debug capabilities.

The results

The Luna 340, a 5M-gate-equivalent ASIC design, was fully emulated on the Axis Xtreme system to enable system verification and to accelerate software development and system tuning.

Not only did emulation-based HW/SW co-verification improve system performance, it also shortened our design cycle. Our software designers started working on the emulated version of the design four months before we got silicon prototypes back.

Now that we're at 0.15 micron, one iteration of silicon would cost us four months and $1 million. Emulation isn't optional anymore. The emulator is running all the time, and there's a waiting list to use it.
